ROCsearch - An ROC-Guided Search Strategy for Subgroup Discovery

Authors

  • Marvin Meeng
  • Wouter Duivesteijn
  • Arno J. Knobbe
Abstract

Subgroup Discovery (SD) aims to find coherent, easy-to-interpret subsets of the dataset at hand, where something exceptional is going on. Since the resulting subgroups are defined in terms of conditions on attributes of the dataset, this data mining task is ideally suited to non-expert analysts. The typical SD approach uses a heuristic beam search, involving parameters that strongly influence the outcome. Unfortunately, these parameters are often hard to set properly for someone who is not a data mining expert; correct settings depend on properties of the dataset and on the resulting search landscape. To remove this potential obstacle for casual SD users, we introduce ROCsearch [1], a new ROC-based beam search variant for Subgroup Discovery. On each search level of the beam search, ROCsearch analyzes the intermediate results in ROC space to automatically determine a sensible search width for the next search level. Thus, beam search parameter setting is taken out of the domain expert’s hands, lowering the threshold for using Subgroup Discovery. ROCsearch also automatically adapts its search behavior to the properties and resulting search landscape of the dataset at hand. Aside from these advantages, we also show that ROCsearch is an order of magnitude more efficient than traditional beam search, while its results are equivalent to, and on large datasets even better than, those of traditional beam search.
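
The abstract does not spell out how the search width is derived from ROC space. As a minimal sketch of one plausible reading, the Python below assumes that each candidate subgroup is summarized by a point (false positive rate, true positive rate) and that the next beam consists of the candidates lying on the upper convex hull of those points, so the width is simply the number of hull points. The function name `roc_hull_beam` and the hull-based selection rule are illustrative assumptions, not details taken from the abstract.

```python
from typing import List, Tuple

# A candidate subgroup as a point in ROC space: (false positive rate, true positive rate).
Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_hull_beam(candidates: List[Point]) -> List[Point]:
    """Return the candidates on the upper convex hull in ROC space.

    Assumption (hypothetical selection rule): the hull points form the next
    beam, so the beam width adapts to the data instead of being a parameter.
    """
    # (0, 0) and (1, 1) anchor the hull to the ROC diagonal.
    anchors = {(0.0, 0.0), (1.0, 1.0)}
    points = sorted(set(candidates) | anchors)
    hull: List[Point] = []
    for p in points:
        # Moving left to right, the upper hull only makes clockwise turns;
        # drop the last point while it would create a non-clockwise turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return [p for p in hull if p not in anchors]


level_results = [(0.1, 0.6), (0.3, 0.5), (0.4, 0.9), (0.5, 0.2)]
beam = roc_hull_beam(level_results)
print(len(beam), beam)
```

Under this assumption, a level whose candidates cluster near the ROC diagonal yields a narrow beam, while a level with many non-dominated candidates yields a wider one.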

Similar resources

ROCsearch in a Wider Context — A ROC-Guided Search Strategy for Subgroup Discovery and Beyond

ROCsearch is a ROC-based beam search variant, initially developed for Subgroup Discovery (SD). In ordinary beam search, on each search level, a fixed number of best-scoring candidates are selected to generate candidates for the next search level. This fixed number, the beam width, is typically hard to set, and its setting strongly influences the outcome of the mining process. In ROCsearch, howe...
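
For contrast, ordinary beam search with a fixed beam width, as described in this summary, can be sketched as follows. This is a generic illustration, not code from the paper: the names `select_beam`, `beam_search`, `refine`, and `quality`, as well as the toy candidates (tuples of integers scored by their sum), are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Candidate = Tuple[int, ...]  # toy stand-in for a subgroup description


def select_beam(candidates: Iterable[Candidate],
                quality: Callable[[Candidate], float],
                beam_width: int) -> List[Candidate]:
    """Keep the `beam_width` best-scoring candidates (the fixed-width step)."""
    return heapq.nlargest(beam_width, candidates, key=quality)


def beam_search(seeds: List[Candidate],
                refine: Callable[[Candidate], List[Candidate]],
                quality: Callable[[Candidate], float],
                beam_width: int,
                depth: int) -> List[Candidate]:
    """Level-wise search: only the current beam generates the next level."""
    beam = select_beam(seeds, quality, beam_width)
    best = list(beam)
    for _ in range(depth - 1):
        candidates = [child for parent in beam for child in refine(parent)]
        if not candidates:
            break
        beam = select_beam(candidates, quality, beam_width)
        best = select_beam(best + beam, quality, beam_width)
    return best


def refine(c: Candidate) -> List[Candidate]:
    # Toy refinement: extend a candidate with a larger item from {1, 2, 3}.
    return [c + (x,) for x in (1, 2, 3) if x > c[-1]]


seeds = [(1,), (2,), (3,)]
narrow = beam_search(seeds, refine, sum, beam_width=2, depth=2)
wide = beam_search(seeds, refine, sum, beam_width=3, depth=2)
print(narrow, wide)
```

In this toy run, the narrow beam discards `(1,)` on the first level, so `(1, 3)` is never generated; the wider beam finds it. That is the kind of sensitivity to the beam width that the summary refers to.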

Expert-Guided Subgroup Discovery: Methodology and Application

This paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically s...

Analysis of Example Weighting in Subgroup Discovery by Comparison of Three Algorithms on a Real-life Data Set

This paper investigates the implications of example weighting in subgroup discovery by comparing three state-of-the-art subgroup discovery algorithms, APRIORI-SD, CN2-SD, and SubgroupMiner on a real-life data set. While both APRIORI-SD and CN2-SD use example weighting in the process of subgroup discovery, SubgroupMiner does not. Moreover, APRIORI-SD uses example weighting in the post-processing...

A GUIDED TABU SEARCH FOR PROFILE OPTIMIZATION OF FINITE ELEMENT MODELS

In this paper a Guided Tabu Search (GTS) is utilized for optimal nodal ordering of finite element models (FEMs) leading to small profile for the stiffness matrices of the models. The search strategy is accelerated and a graph-theoretical approach is used as guidance. The method is evaluated by minimization of graph matrices pattern equivalent to stiffness matrices of finite element models. Comp...

Rule induction for subgroup discovery with CN2-SD

Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can be adapted also to subgroup discovery. This paper shows how this can be achieved by modifying the CN2 rule learning algorithm. Modifications include a new covering algorithm (weighted covering algorithm), a new search heuristic (weighted relative accuracy), probabilistic...

Publication date: 2014